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§1  Introduction. 


In  the  problem  of  estimating  the  covariance  matrix  of  a  multivariate  normal 
population  the  usual  estimator  is  the  sample  covariance  matrix  S  —  A/n  where  A  is 
distributed  according  to  the  Wishart  distribution  W{E,n).  Although  5  is  unbiased  it  is 
known  that  the  (sample)  characteristic  roots  of  5  tend  to  be  more  spread  out  than  the 
corresponding  (population)  roots  of  27.  This  can  be  seen  as  follows.  Let  be  the  largest 
characteristic  root  of  27  and  £  be  an  associated  unit  characteristic  vector,  then 


(U)  Xa  =  £'27£,  =  l. 

Ls  Unb5ased  for  Xi  since  =  t’SWt  =  =  Xt.  On  the  other 

largest  sample  root  lv  can  he  written  as 

(I*'*)  h  =  max  x'  Sx: 

x'x=i 


hand  the 


hence 


=  £(jnax(  *'5x)  >  £ {£'S£)  —  Xt. 

See  Van  der  Vaart(L961),  Anderson(1963).  Similarly  for  the  smallest  roots  \p,f.p  of  27,  S, 
respectively,  we  have  £{ip)  <  \p.  It  is  these  implicit  maximization  and  minimization 
processes  for  each  observed  S  that  makes  the  sample  roots  more  spread  out  than  the 
population  roots.  Actually  in  terms  of  rnajorization  the  following  holds:  . .  ,  £ (t!p)) 

majorizes  (Xt,...,Xp).  See  Chapter  12,  Section  E  of  Marshall  and  01kin(1979). 

The  above  consideration  suggests  that  we  shrink  the  sample  roots  toward  a  middle 
value.  This  is  analogous  to  the  Stein-type  estimation  of  a  multivariate  norma!  mean. vector. 
Earlier  works  along  this  direction  can  be  found  in  Stcin(1975),  Efron  and  Morris(1976), 
Haff(1977a,b,  1979, 1980),  Eaton(1970),  Sugiura  and  Fujimoto(1982)  and  others. 

Another  approach  was  taken  in  Stein(1956),  James  and  Stcin(l961),  Selliah(1964), 
and  Olkin  and  Selliah(1977).  They  are  concerned  with  minimax  estimation  of  the  covariance 
matrix.  Minimax  estimators  can  be  obtained  by  considering  the  best  invariant  estimator 
with  respect  to  the  triangular  group  C+  (the  group  consisting  of  lower  triangular  matrices 
with  positive  diagonal  elements).  An  unappealing  property  of  these  estimators  is  that  they 
depend  on  the  coordinate  system. 


2 


In  this  paper  we  propose  an  orthogonally  invariant  minimax  estimator  which  is 
derived  from  the  minimax  estimators  above  by  averaging  them.  The  idea  of  averaging  al¬ 
ready  appears  in  Stein(1956)  and  the  specific  estimator  proposed  below  is  briefly  mentioned 
in  Eaton(1970)  (his  formula  3.6).  But  it  seems  to  have  never  been  studied  carefully. 

In  Section  2  we  derive  the  estimator  and  study  its  properties.  Details  of  com¬ 
putation  are  given  in  Section  4.  For  dimensions  2  and  3  the  estimator  is  given  explicitly. 
For  larger  dimensionalities  the  explicit  integration  involved  seems  formidable.  The  Monte 
Carlo  method  is  always  available,  but  some  good  approximation  is  desirable.  In  Section  3 
we  study  the  risk  behavior  of  the  estimator.  If  the  number  of  degrees  of  freedom  is  not 
too  large  compared  to  the  dimensionality,  it  shows  a  substantial  improvement  over  the 
minimax  estimator  mentioned  above  for  a  wide  range  of  population  covariance  matrices. 


§2  Derivation  of  the  estimator. 

Suppose  that  a  sy  me  trie  p  X  p  random  matrix  A  is  distributed  according  to 
W(27,n)  and  consider  the  problem  of  estimating  27  with  the  following  loss  functions: 

LV(S,  27)  =  tr^i;-1)  -logdct^i;-1)-*), 

(2-1)  L2(27,  27)  =  tr(i727“1  -  i)2. 


For  these  loss  functions  the  best  estimators  among  the  scalar  multiples  of  A  are 
given  by  A/n,  A/(n  +  p  +  1)  for  L1;  L2  respectively  (see  Ilaff(1980)  for  example).  Although 
these  estimators  have  a  constant  risk,  they  are  not  minimax.  Minimax  estimators  were 
obtained  by  considering  the  best  invariant  estimator  with  respect  to  Gj-.  They  are  of  the 
form 


(2.2)  27(A)  =  TDT', 

where  D  =  diag(d1, . . . ,  dp)  and  TGGj  with  TT'  =  A. 
For  Li 


di  = 


n  +  p ■■+  1  —  2*  ’ 
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See  Stein(1956),  James  and  Stein(1961).  For  L2,  d  =  [du  . . . ,  dp)>  is  given  by 

(2-4)  d  =  F~1f, 

where  F=  (/*),/=(/,•)  and 

hi  =  (n  +  p  -  2i  +  l)(n  +  p  -  2i  +  3), 

(2.5)  fij  =  f]i  =  n  +  p-2j +  1,  i  <  j, 

fi  =  n  +  p  —  2i  +  1. 


See  Selliah(1964),  Olkin  and  Selliah(1977). 


Note  that  for  L,  we  have  dx  <  •••  <  dp.  The  same  ordering  seems  to  hoid 
for  L2  as  well.  This  causes  a  rather  unpleasant  asymmetry  of  the  estimator:  the  first  few 
rows  and  columns  become  smaller  compared  to  others.  This  asymmetry  leads  to  the  idea  of 
symmetrizing  the  estimator  by  averaging  over  different  coordinates.  Let  Tbe  an  orthogonal 
matrix  corresponding  to  a  change  of  orthonormal  bases.  In  the  new  coordinates  A  ,  17  are 
written  as  r'AT,  r'Sr  respectively.  We  estimate  r' ZF  by  the  above  method,  namely 

(2-6)  r!zr=  tvdtv', 


where 


TrTv'  =  r'AT. 

Returning  to  the  original  coordinates  we  have 


=  rTrDTr'r'. 

This  gives  an  estimator  different  from  (2.2).  Furthermore  since  the  loss  functions  are  fully 
invariant,  Zr  has  the  same  constant  minimax  risk  as  £  .  Now  let  ft,  be  a  probability 
distribution  on  0(p),  the  group  of  p  X  p  orthogonal  matrices.  Because  of  the  (strict) 
convexity  of  the  loss  functions  we  obtain  an  improved  estimator  by  averaging 


(2.7) 


=  /  rrrDTr'r'dfx(i). 

Jo(p)  ^  1 


Note  that  is  minimax  being  an  improvement  over  a  minimax  estimator.  Interesting  cases 
are  (i)  p:  the  uniform  distribution  of  permutation  matrices,  (ii)  p:  the  uniform  distribution 
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(Ilaar  measure)  on  0(p).  These  cases  are  briefly  mentioned  in  Stein(  1956,  formula  4.13), 
Eaton(1970,  formula  3.6),  respectively.  See  Sharma(1980)  too.  For  this  paper  we  consider 
the  uniform  distribution  on  0(p)  and  study  the  resulting  estimator 

(2.8)  i)0(A)=  f  rTvDTr'r'dr, 

Jo[p) 

where  d-F=  dp(r).  From  the  invariance  property  of  the  uniform  measure  we  have 
Lemma  2.1.  (Orthogonal  invariance  of  E q.)  For  any  orthogonal  T 

(2.9)  e0  {rAr')  =  r£0(A)r'. 


Lemma  2.2.  If  A  is  diagonal  then  Eo(A)  is  diagonal. 

Proofs  are  straightforward  and  omitted.  Now  let  A  =  -ToA/o'  where  JTo  is 
orthogonal  and  A  is  diagonal.  Then 

(2.10)  £o(A)  -  E0iraArQ')  =  r0Eo{A)rQ' 

and  Eo(A)  is  diagonal.  We  see  that  Eq  modifies  only  the  characteristic  roots  of  A.  For 
notational  convenience  we  define  <f>i, . , . ,  <f>p  by 

(2.11)  diag(0i  («),...,  <£p(a))  =  ^o(diag(«! , . . . ,  ap)), 


where  a  =  («i, . . . , ap).  We  are  interested  in  the  behavior  of  if>\, .. .  We  will  sec  the 
shrinking  of  the  roots  mentioned  in  the  introduction.  Let  us  look  at  the  simplest  case 
P  =  2- 


Theorem  2.1.  For  p  =  2 


(2.12) 


<M«t,a2)  —  “ici  ~  <*i 


<f>2[oii,a2)  =  a2c2  —  a  2 


/ 

+  \/«2 

r  _ 

\A*i  +  \/a2 


d  i  + 


d  i  + 


Val  +  V«2 

^  j  \ 

/ —  I  / — 

sJCLx  +  \/«2 


Note  that  as  ayjcx2  approaches  co,  0x~atdi,  cj)2~a2d2.  Now  for  L\  we  have 
di  =  l/(n  +  1)  <  1/n  <  d2  =  1  /(«.  —  1).  This  shows  that  if  ai  a2  then  the  larger  root 


2.  Derivation  of  the  estimator. 


5 


is  shrunk  and  the  smaller  root  is  expanded  relative  to  the  unbiased  case  5=  A/n.  When 

<*i  =  £*2  =  a  then  <f>t  =  <f>2=a(di  +  d2)/ 2.  The  shrinking  factors  ci,c2  change  smoothly 
between  these  two  cases. 

For  p  =  3  the  integration  over  0(3)  is  already  tedious.  We  give  an  infinite  series 
expression  for  Convergence  is  reasonably  fast  but  the  form  of  the  series  is  not 

very  revealing.  Let  (a)*  =  a(a  +  !)•  • .  (a  +  k  —  1)  and  let 


CO  oo 

Fl  («;  b,  b';  c;  x,  y)  =  Y2  Hl, 

m.—- o  n=0 


(^Om+Tl((,)m(fr,)*'^ 

rn!n!(c)m+n 


be  Appell’s  hypergeometric  function  of  two  variables.  Furthermore  for  convenience  let 
H{b,  b'\  c;  x,  y)=Fi(  1;  b/2,  b' / 2;  c/2\x,y). 


Theorem  2.2.  Let  p  =  3  and  0  <  a  <  fi  <  1.  Then 
(2J3) 

^(1,1  -a,  l  -  3)  =  _  (d3-di)(a  +  0)  _  d2-di 


—  ^2 


15 


JI(l,l;5;a,/3) 


®  H&> 3;  ft  ft  +  3«2(1  -  ^)«(5, 3;  9;  0,  7)  +  3/?»(a  -  «)//(5, 1;  9;  ft,  7)}, 


wAere  7  ==  (/3  -  a)/(l  -  a), 

(2.14) 

dj  +  d2  4-  f/3 


^2(1, 1  —  a,  l  —  /3)  = 
d2  —  d\ 


3 


a 

15 


(7dj  +  5d2  +  3  c/3)  —  — (rf3  —  dt) 
10 


105  ftSa  9i  0)  +  2at(a  -  /?)//( 3,  3;  9;  a,  /?)  +  3(a  -  /?)2//(3,  5;  9;  a,  /?)} 

I0#(1  -2a){3(a  ~  ft*1  ft1’ 5;  ^  ft  ft  +  3a;2(1  ~  5;  9;  0, 7)  +  /?2(1  -  a)f/ (3, 3;  9;  0, 7)}, 


^s(l>  1  ft,  1  /?)  —  (t  P){d\  +  d2  +  d2  —  <f>  1  —  <f)  2/(1  —  a)}. 

Note  that  these  formulas  suffices  for  all  cases  because  the  estimator  is  invariant 
with  respect  to  scale  change  and  permutation  of  roots. 

In  order  to  illustrate  how  behave  let  n  =  6, p  =  3,  eft-  =  1/(10  —  2 *), 

S  —  A/n  =  diag(l,  a, 0).  Values  of  for  several  values  of  a  and  0  are  given  as 

follows. 
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a 

P 

<f>i 

<f>  2^2) 

^3(^3) 

1.0 

1.0 

1.083 

1.083(1.08) 

1.083(1.08) 

1.0 

.7 

1.065 

1.065(1.07) 

.783(1.12) 

1.0 

.5 

1.049 

1.049(1.05) 

.576(1.15) 

1.0 

.1 

.976 

.976(.98) 

.130(1.30) 

.7 

.7 

1.048 

.771(1.10) 

.771(1.10) 

.7 

.5 

1.032 

.758(1.08) 

.567(1.13) 

.7 

.1 

.961 

.703(1.00) 

.128(1.28) 

.5 

.5 

1.016 

.559(1.12) 

.559(1.12) 

.5 

.1 

.948 

.517(1.03) 

.127(1.27) 

.1 

.1 

.889 

.118(1.18) 

.118(1.18) 

Note  that  n[d\  +  d%  +  t/3 )/^  =  1.083  seems  to  have  an  overall  effect.  When  this 
overall  multiplicative  constant  is  taken  into  account,  then  the  shrinking  effect  is  evident. 
Another  thing  of  interest  is  that  the  middle  root  seems  to  be  “neutral”  having  this  overall 
shrinking  factor  (1.083)  when  S  ass  diag(l,  a,  a2).  The  case  a  —  .7,  P  =  .5  in  the  above 
table  is  illustrative. 

For  general  dimension  p,  p(p  —  l)/2-fold  integration  is  involved.  Although  infinite 
series  expression  as  in  Theorem  2.2  is  always  possible  in  principle,  it  will  be  complicated 
and  convergence  might  be  slow.  Then  a  Monte  Carlo  method  can  be  used.  We  will  discuss 
this  in  Section  3  and  Section  4.  Here  we  give  a  qualitative  description  of  the  estimator. 

Note  that  D  can  be  written  as  D  —  dxEn-\ - +dpEpp  where  E{i  has  1  in  th  position 

and  0  everywhere  else.  Putting  this  into  (2.8)  we  see  that  ..,<f>p  are  linear  in  dX).  ..,dp. 
Therefore  we  can  write 

(2.16)  <S k{ot)  ~  otiCi  —  ai(wndi  -\ - +  wipdp),  i—  t, . . . ,p. 

Let  W(cx)  =  (^(cx)). 

Theorem  2.3.  W(a)  is  doubly  stochastic,  namely  >  0,  ]TA  u;ty  =  1,  J2iwij  —  1* 
Proof:  Let  et-  denote  a  vector  with  1  as  i-th  element  and  0  everywhere  else.  Now 


<j>i(ct)  —  e^o^iagoijej, 


3.  Risk. 


hence 


(2.17) 


otiWij  =  /  e^rTi'EjjTr'  r'  e^dr 

J0(p ) 

=  low  <rr'‘AT'‘r'^r 

=Ljrr^dr^a- 


' o(p ) 

Hence  wtJ(ot)  >  0.  Now  consider  the  special  case  D  —  I.  Then 

^o(A)  =  J  rTrTr'r'dr=  J  rr,Arr/dr=A. 

Hence  &(ex)  =  oq.  This  implies  wfl  +  •••  +  u;ip  =  1  for  i  =  l,...,p. 
consequence  of  the  following  lernrna. 

Lemma  2.3. 


=  1  is  a 


(2.18) 


Proof: 


(2.19) 


tr(270(A)A~l)  =  tr  £>. 

tr(  J  rTvDTv'r'drA-1) 

=  f  tr ( JTr  D 2p ' T'A~  l)dr 

-  J  tr(rrL>2V,(TrTr/)-,)dr 

=  J  tr  DdP  =  IrD. 


Remark  2.1.  The  estimator  is  characterized  by  the  shrinking  factor  a  and  in  turn 
is  characterized  by  wi3  in  (2.17).  However  because  of  the  lack  of  linearity  (2.17)  seems 
hard  to  analyze  in  general.  For  example  it  is  not  even  dear  if  A  >  B  (:A  —  B  is  positive 
semidefinite)  implies  270 (A)  >B0(B) 


§3  Risk. 

I1  or  L i  the  risk  can  be  evaluated  in  the  following  fairly  simple  form  and  gives  a 
nice  qualitative  understanding  of  its  behavior. 


Theorem  3.1. 


$ 


RX{E,E  o)~£ 


(3.1) 


=  ~Y1  S\ogCi(a).-plog2  -  - ")• 

i=i  i=i 


where  Ci(a)’s  are  the  shrinking  factors  and  ip(a)  =  r'(a)/F(a). 


Proof :  Wc  look  at  tr  27  1 27  term  first. 


(3.2) 


£c{tr  27~1£0}  =  <?s(tr  27-1  J  rTvDTv  T'dl) 
=  J  £s(tr  r'27“l ITpUTr  ')dr 
=  J  £s.(tr  E*-lTDT')dr 
=  J  £KK>{tr  T'K'~ 1 KT 1 TD) dr 
=  J  £/(tr  T'TD)dr 

p 

^  ^  dj  £  Xn+P_ 2i+l 
1  =  1 
=  p- 


where  27*  =  r'27.T,  T,  K  £  Gj  with  3T'  =  A,  IOC'  =  27*.  Therefore 

fs(tr  27-1270  —  logdet27-1270)  -  p 
=  —£s  (log  det  27_1270) 

p 

=  -£s(  l°g(«iei)  -  log  det  27) 

i—  1 
P 

(3.3)  —  -^e(X)  logc*  +  log  det  A  —  log  det  27) 

1=1 

=  logC<)  -  Y1  loSXn+i -i 

i=l  i=l 

=  -  £  logci  -plog2  -  2  ^(^i — -)■ 

i=  1  i=l  i 
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Corollary  3.1. 

X log  di  <  Xlogc<  -  Plog(E^'/p)' 

plog (^r) ~  plog 2  -  X ^  R^s> ^o) 

<  -  x  log  ^  -  P  log  2  -  e  ^(!L!i— 

Proof :  We  use  the  concavity  of  log  and  Jensen’s  inequality. 

(3-6)  ^  X  log ci  ^  !°s(X  ci/p) =  i°g(X  rf*/p)- 

This  proves  the  second  inequality  of  (3.4).  Now 

(3‘^)  l°gci  =  log(wii^i  +  •  ••  +  Wipdp)  >  wij  log  dy. 

i 

Adding  over  different  %  we  obtain 

X  log  Ci~yi  Wii  log  di  =  X  !og  df ■ 

*  i<3  j 

This  proves  the  first  inequality.  g 

It  can  be  easily  shown  that  the  right  hand  side  of  (3.5)  is  the  minimax  risk: 
Ri(Z2,TDT').  The  left  hand  side  of  (3.5)  gives  an  absolute  bound  for  the  improvement 
by  using  E0.  This  bound  is  attained  if  (3.6)  holds  with  equality,  i.e.  if  a  ■  =  •  •  •  =  cp. 
This  happens  when  aq  =  •  •  •  —  a p.  Therefore  we  expect  that  the  largest  improvement 
occurs  when  E  =  I.  Note  that  when  n  is  small  then  sample  roots  fluctuate  a  great  deal 
and  assuming  that  at-’s  are  nearly  equal  and  replacing  c,-  by  J2di/P  in  (3.6)  seems  too 
optimistic.  On  the  other  hand  if  n  is  not  too  small  the  absolute  lower  bound  for  the  risk 
should  be  reasonable.  Now  since  E0  is  minimax,  its  risk  has  to  approach  the  minimax 
risk  for  some  E.  This  corresponds  to  having  the  equality  in  (3.7)  for  all  i.  This  implies 
that  W[a)  in  Theorem  2.3  is  a  permutation  matrix.  In  the  2-dimensional  case  this  happens 
when  a i /a 2  ->  oo.  In  general  dimensions  it  is  not  easy  to  say  when  W(ck)  approaches  a 


(3.4) 
hence 

(3.5) 
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permutation  matrix  but  we  expect  that  it  corresponds  to  the  case  of  extreme  singularity  of 
A.  Here  we  present  some  Monte  Carlo  results  to  illustrate  these  points. 

First  consider  the  case  £  —  I.  For  p  =  2  and  p  =  5  and  for  selected  values  of  n 
we  list  risk  of  S,  minimax  risk,  risk  of  £0,  and  the  lower  bound  given  in  (3.5).  The  number 
in  parentheses  after  the  minimax  risk  is  its  percentage  to  the  risk  of  S.  The  other  numbers 
in  parentheses  are  percentages  to  the  minimax  risk. 


P  =  2 


n 

Ri(S) 

minimax 

tfi(£o) 

lower  bd. 

2 

2.54 

2.25(88.7) 

2.07(91.8) 

1.97(87.2) 

3 

1.35 

1.23(91.3) 

1.14(92.5) 

1.12(90.5) 

4 

.927 

.862(93.0) 

.808(93.7) 

.798(92.5) 

6 

.571 

.543(95.1) 

.518(95.3) 

.515(94.8) 

10 

.324 

.314(96.9) 

.304(97.0) 

.304(96.8) 

15 

.210 

.206(97.9) 

.202(97.9) 

.201(97.8) 

p  = 

=  5 

n 

Ri{S) 

minimax 

Rv{2o) 

lower  bd. 

5 

5.96 

4.76(79.9) 

3.9(82) 

3.06(64.2) 

6 

3.99 

3.28(82.3) 

2.73(83.2) 

2.41(73.5) 

8 

2.52 

2.17(86.0) 

1.88(86.4) 

1.78(82.0) 

10 

1.87 

1.65(88.5) 

1.47(88.7) 

1.43(86.2) 

15 

1.14 

1.05(92.0) 

.970(92.1) 

.959(91.1) 

The  following  serves  as  a  concise  summary:  (when  £  —  I)  the  ratio  of  the  risk 
of  £0  to  the  minimax  risk  is  roughly  equal  to  the  ratio  of  the  minimax  risk  to  the  risk 
of  S.  Also  note  that  the  absolute  lower  bound  is  realistic  for  n  not  too  close  to  p.  These 
observations  hold  in  our  other  Monte  Carlo  results  as  well. 

For  p  —  5  the  estimator  £q  itself  was  calculated  by  Monte  Carlo.  It  was  found 
that  the  replication  size  for  this  step  need  not  be  too  large  and  the  replication  size  of  100 
was  used.  Actually  the  estimated  risks  for  the  replication  sizes  50,  100,  200,  and  500,  and 
for  n  =  10  were  all  1.47  with  standard  error  about  equal  to  .001  in  each  case.  For  more 
on  this  point  see  Section  4. 
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The  remaining  question  is  how  £  should  be  close  to  being  singular  for  the  risk  to 
approach  the  minimax  risk. 

For  p  —  2,  the  risk  depends  only  on  the  ratio  of  two  population  roots,  say, 
X  =  Xi/X2  (Xi  <  X2).  The  following  table  gives  values  of  the  risk  for  n  —  2,7,  X1/4  = 
.1,.2,...,1.0.  %  means  percentage  of  the  risk  to  the  minimax  risk. 


n  =  2 

n  =  7 

X 

risk 

% 

X 

risk 

% 

1.0000 

2.069 

91.8 

1.0000 

0.4400 

95.9 

0.6561 

2.071 

91.9 

0.6561 

0.4402 

95.9 

0.4096 

2.076 

92.1 

0.4096 

0.4409 

96.1 

0.2401 

2.085 

92.6 

0.2401 

0.4421 

96.3 

0.1296 

2.100 

93.2 

0.1296 

0.4439 

96.7 

0.0625 

2.122 

94.2 

0.0625 

0.4464 

97.3 

0.0256 

2.147 

95.3 

0.0256 

0.4496 

97.9 

0.0081 

2.179 

96.7 

0.0081 

0.4528 

98.7 

0.0016 

2.212 

98.2 

0.0016 

0.4559 

99.3 

0.0001 

2.241 

99.5 

0.0001 

0.4582 

99.8 

We  see  that  the  risk  approaches  the  mini  max  risk  only  when  £  is  very  close 
to  singular.  This  is  a  real  advantage  of  using  £0.  From  our  other  Monte  Carlo  results 
the  above  seems  to  hold  for  general  p,  namely,  £q  is  a  substantial  improvement  over  the 
constant  risk  minimax  estimator  for  wide  range  of  population  covariance  matrices. 

§4  Proofs  and  some  computational  details. 

We  are  going  to  give  some  details  of  the  derivation  of  Therems  2. land  2.2.  First 
we  note  the  following. 

Lemma  4.1. 

(4-!)  £o(A)  =  2  [  TTrDTr'r'dr. 

This  is  straightforward  and  we  omit  the  proof. 
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Proof  of  Theorem  2.1.  By  Lemma  4.1  we  can  represent  the  uniform  distribution  on  0(2) 

by 

(cos  0  —  sin 

sin  0  cos  6 

where  0  is  uniformly  distributed  on  [0,  27r].  Since  TV  is  doubly  stochastic  we  only  need  to 
calculate  W22(°M>  Q!2)=='l,;22(a!i/Q!2)  !)•  Let  a  =  a\ja2  and  A  —  diag(a,  1).  Then  by  (2.17) 


(4.3) 


Let  r'  be  given  by  (4.2).  Then  it  is  easy  to  obtain  Tp 22=V^/v  acos2  ^  +  sbi2  0-  Hence 


(4.4) 


™22 


27 r  Jo 


a  e 


ms2  0 


dO 


a  cos2  0  +  sin2  0 


The  last  equality  is  verylied  using 


(4.5) 


[  dO 
J  1  4-  a  cos2  0 


y/l  +  a 


arc tan  ( - 


tan  0 


WTT 


Proof  of  Theorem  2.2.  We  show  only  some  essential  steps  in  the  derivation  of 
(2.13)  and  (2.14).  (2.15)  is  a  consequence  of  Lemma  2.3.  Because  of  the  scale  invariance  we 
can  set  D— diag(l  —  e  —  d,  1  —  e,  1)  without  loss  of  generality.  Let  r'  =  ( 9ij)i<i,j<3 ■  Then 


(4.6) 


B  =  [bij)  =  r'AT 

(  1  -  ouj\2  -  @g\z 

—  I  — «!7l2<722  —  /3gi3  £723 
V — “312332  —  @9  13033 


\ 

1  —  a022  ~~  0023 

—0^022032  ~  @023933  1  —  a032  ~~  @033 J 


and 


(4.7)  TfDTr '  — 


r(l  —  e  —  d)bn 
(1  -  e  -  d)62i 
\(1  -  e  -  d)b3i 


—dbh/bn  +  (1  —  e)b22 

-^621^31/^11  +  (A  —  e)&32 


-db31/bn  —  ev  +  633/ 


4.  Proofs  and  some  computational  details, 
where 
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«  =  (<>31  4  32)f‘U  *2,VY‘11') 

V&21  022 J  \b32J 

_  &3i ^22  +  &f2fen  —  2b3i 63262i 
&11&22  ~  &21 

-  gi2g323(«  ~  P?  +  gf,<&a2(l  -  /?)  +  gftgjgffil  -  q) 
1  -  a(l  -  p§2)  -  ,5(1  _  grg3)  -f 


(4.8) 


We  used  the  fact 


031  —  ^31  —  (ffl2!?23  —  922ffl3)2, 


because  [gX])  r  r~  (Atj)/|rj  —  (AtJ).  Now  the  denominator  of  v  can  be  written 
as  follows  and  then  can  be  expanded  in  an  infinite  series. 


(4.9) 


1  “(•'*•  £32)  ~  /?(1  ~  <733)  +  OiPg\x 


(1  -  «)(1  -pg\i 


2  /3~  a  2 


1  —  a 


^32)- 


Similarly  l/bn  1/(1  ag\2  Pg%)  can  be  expanded.  To  evaluate  the  integral  we  use  the 
fact  that  {g\x,g\2,  g\i)  is  distributed  according  to  the  Dirichlet  distribution  with  parameters 

(14,  i).  One  more  thing  needed  is  the  following  expectation  S OiimmHtodr.  This 
can  he  evaluated  by  noting  (y,  ,y2,  +  y12y22)3  =  M<1  hence  2  /  y„  y12y21y22(jr  = 

~  hxxQlydr.  | 


For  1 


For  a  special  case  where  two  of  the  roots  coincide  we  have  the  following  result. 


Mh  a,  a)  =  dt  -  ~{(d3  -  d2)(I2  -  1)  +  (d2  -  dx){h  -  1)}, 

(4.10)  a> a )  =  03(1,  a,  a) 

_  a(dt  +  d2)  _  d3  -  di  ,d2-dl2  d3  -  do  „ 

2  2  +  2(a  -  1) (a  h  ~^  +  2{a~  l)^h  ~  ^ 


where  I2{a)  =  h{l/a)fa  and 


h{a)  =  { 


arcsinfl— q)  .  , 

’f  a<1- 

— t4  =  log  ( 

.  1\J ot(ct — 1)  V  \/q— \/ot— 1 ) 


if  a  >  1. 


u 


We  sketch  the  proof  of  this.  Putting  a  =  (3  in  (4.6)  and  (4.8)  and  replacing  a  by  1  —  a  we 
obtain 

fell  =  1  -  (1  -  a)(l  -  gli)  —  a  +  (1  -  a)g2n 
(4.11)  „  _  (<*  ~  1)2(!  ~  g|i)g|l 

l  +  (a-l)rfi  • 

After  polynomial  devision  we  are  left  with  f(a  +  (1  —  a)gj1)~1dr,  /(I  +  (a  —  l)g\x)~xdr 
and  other  terms  are  polynomial  terms  in  gij.  Now  after  appropriate  transformation  we 
obtain 

/<« + <*  -  -w*)-1"1- 1,  a+(-^  - 

a  1 + <“  -  r  ^ = /2<“> 

71}/2  are  given  in  (4.10).  The  rest  of  the  computation  is  straightforward. 

Now  we  discuss  .Monte  Carlo  methods  to  calculate  E0  for  general  dimensionality. 
One  objection  to  the  estimator  might  be  that  it  is  expensive  to  compute  for  large 
p.  However  as  mentioned  in  Section  3  our  Monte  Carlo  results  show  that  the  size  of  the 
replications  to  obtain  the  estimator  need  not  be  too  large  (at  least  from  the  viewpoint 
of  improved  risk).  For  j>  =  5,  50  replications  practically  achieves  the  same  risk  as  -So- 
Uniform  orthogonal  matrices  can  be  generated  by  the  Gram- Schmidt  orthogonalization  of 
columns  of  a  matrix  whose  elements  are  independent  standard  normal  variables.  This  is 
described  in  Chapter  8  of  Lehmann(1959).  Also  note  that  there  is  a  subtle  problem  in  the 
application  of  Monte  Carlo  method:  (i)  either  we  apply  (2.8)  directly  for  A,(ii)  or  we  use 
(2.10)  first  and  apply  (2.8)  for  A  discarding  the  off  diagonal  elements.  From  logical  point  of 
view  (i)  is  legitimate.  Finite  averaging  itself  improves  the  constant  risk  minimax  estimator. 
Therefore  for  the  purpose  of  risk  comparison  this  method  was  used  in  Section  3  for  the  case 
p  —  5  and  £  =  I.  On  the  other  hand  (2.10)  does  not  exactly  hold  with  simulated  uniform 
distribution.  From  practical  viewpoint,  however,  the  latter  seems  to  be  a  reasonable  thing 
to  do.  Another  point  is  that  if  the  size  of  Monte  Carlo  is  not  big  enough,  we  sometimes 
obtain  <f>i  <  even  when  >  otj.  If  this  happens,  either  the  simulation  size  should  be 
increased  or  correction  of  the  ordering  should  be  considered. 


4.  Proofs  and  some  computational  details. 
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